An Evaluation Study of Intrinsic Motivation Techniques Applied to Reinforcement Learning over Hard Exploration Environments

Andres, Alain; Villar-Rodriguez, Esther; Del Ser, Javier

doi:10.1007/978-3-031-14463-9_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13480)

Included in the following conference series:

International Cross-Domain Conference for Machine Learning and Knowledge Extraction

Abstract

In the last few years, research activity around reinforcement learning tasks formulated over environments with sparse rewards has been especially notable. Among the numerous approaches proposed to deal with these hard exploration problems, intrinsic motivation mechanisms are arguably among the most studied alternatives to date. Advances reported in this area have tackled the exploration issue by proposing new algorithmic ideas for measuring novelty. However, most efforts in this direction have overlooked the influence of the design choices and parameter settings that were introduced alongside them to improve the effect of the generated intrinsic bonus, and have not applied those choices to other intrinsic motivation techniques that may also benefit from them. Furthermore, some of those intrinsic methods are applied with different base reinforcement learning algorithms (e.g. PPO, IMPALA) and neural network architectures, which makes it hard to fairly compare the reported results and the actual progress contributed by each solution. The goal of this work is to stress this crucial matter in reinforcement learning over hard exploration environments, exposing the variability and susceptibility of avant-garde intrinsic motivation techniques to diverse design factors. Ultimately, the experiments reported herein underscore the importance of carefully selecting these design aspects in line with the exploration requirements of the environment and the task in question, and of evaluating them under the same setup, so that fair comparisons can be guaranteed.

Notes

1. Depending on the task under consideration, novelty can be associated with the last action performed and/or the next state visited by the agent in the trajectory.
2. A rollout is denoted as \(\tau\), whereas the i-th rollout is denoted as \(\tau_i\).
3. We note that the choice of the neural network architecture applies not only to the actor-critic modules, but also to IM approaches that hinge on neural computation.
4. In this case, we take advantage of the 2D grid (discrete state space) and map each state directly to a dictionary entry when using COUNTS. For more complex state spaces, pseudo-counts [15] can be applied as an alternative, as in [22].
5. Even with different neural architectures and base RL algorithms, they successfully solve the same tasks in MiniGrid.
6. We note that the number of parameters is slightly increased, but the networks also differ in the type of layers they use (the two-headed network uses CNNs, while the independent actor-critic only uses dense layers).
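
The dictionary-based counting mentioned in note 4 can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the class name `CountBonus`, the `scale` parameter, and the 1/sqrt(N) form of the bonus are assumptions (the latter is a common choice for count-based bonuses, but this page does not state which form the paper uses).

```python
from collections import defaultdict
from math import sqrt


class CountBonus:
    """Tabular visitation counts for a discrete state space.

    Hypothetical helper: the 1/sqrt(N) bonus is an assumed form,
    not taken from the paper.
    """

    def __init__(self, scale=1.0):
        self.scale = scale              # assumed bonus coefficient
        self.counts = defaultdict(int)  # state -> number of visits

    def __call__(self, state):
        key = tuple(state)              # make the state hashable
        self.counts[key] += 1
        return self.scale / sqrt(self.counts[key])


bonus = CountBonus()
first = bonus((2, 3))    # first visit to cell (2, 3)
second = bonus((2, 3))   # second visit, smaller bonus
other = bonus((0, 1))    # a different, unvisited cell
```

Under these assumptions, a state yields the largest bonus on its first visit and a decaying bonus on each revisit, which is the behaviour a count-based novelty measure is meant to provide.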

References

1. Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550(7676), 354–359 (2017)
2. Baker, B., et al.: Emergent tool use from multi-agent autocurricula. arXiv:1909.07528 (2019)
3. Holzinger, A.: Introduction to machine learning & knowledge extraction (MAKE). Mach. Learn. Knowl. Extr. 1(1), 1–20 (2019)
4. Aubret, A., Matignon, L., Hassas, S.: A survey on intrinsic motivation in reinforcement learning. arXiv:1908.06976 (2019)
5. Ho, J., Ermon, S.: Generative adversarial imitation learning. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
6. Finn, C., Levine, S., Abbeel, P.: Guided cost learning: deep inverse optimal control via policy optimization (2016)
7. Grigorescu, D.: Curiosity, intrinsic motivation and the pleasure of knowledge. J. Educ. Sci. Psychol. 10(1) (2020)
8. Raileanu, R., Rocktäschel, T.: RIDE: rewarding impact-driven exploration for procedurally-generated environments. arXiv:2002.12292 (2020)
9. Badia, A.P., et al.: Never give up: learning directed exploration strategies. arXiv:2002.06038 (2020)
10. Flet-Berliac, Y., Ferret, J., Pietquin, O., Preux, P., Geist, M.: Adversarially guided actor-critic. arXiv:2102.04376 (2021)
11. Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: International Conference on Machine Learning, pp. 2778–2787 (2017)
12. Burda, Y., Edwards, H., Storkey, A., Klimov, O.: Exploration by random network distillation. arXiv:1810.12894 (2018)
13. Andrychowicz, M., et al.: What matters in on-policy reinforcement learning? A large-scale empirical study. arXiv:2006.05990 (2020)
14. Andrychowicz, M., et al.: What matters for on-policy deep actor-critic methods? A large-scale study. In: International Conference on Learning Representations (2020)
15. Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., Munos, R.: Unifying count-based exploration and intrinsic motivation. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
16. Tang, H., et al.: #Exploration: a study of count-based exploration for deep reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 2753–2762 (2017)
17. Machado, M.C., Bellemare, M.G., Bowling, M.: Count-based exploration with the successor representation. In: AAAI Conference on Artificial Intelligence, vol. 34, no. 4, pp. 5125–5133 (2020)
18. Pîslar, M., Szepesvari, D., Ostrovski, G., Borsa, D., Schaul, T.: When should agents explore? arXiv:2108.11811 (2021)
19. Zhang, T., et al.: NovelD: a simple yet effective exploration criterion. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
20. Bougie, N., Ichise, R.: Fast and slow curiosity for high-level exploration in reinforcement learning. Appl. Intell. 51(2), 1086–1107 (2020). https://doi.org/10.1007/s10489-020-01849-3
21. Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: adversarially motivated intrinsic goals. arXiv:2006.12122 (2020)
22. Taiga, A.A., Fedus, W., Machado, M.C., Courville, A., Bellemare, M.G.: On bonus-based exploration methods in the arcade learning environment. arXiv:2109.11052 (2021)
23. Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. In: AAAI Conference on Artificial Intelligence (2018)
24. Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., Efros, A.A.: Large-scale study of curiosity-driven learning. In: ICLR (2019)
25. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 (2017)
26. Orsini, M., et al.: What matters for adversarial imitation learning? In: Advances in Neural Information Processing Systems, vol. 34 (2021)
27. Jing, X., et al.: Divide and explore: multi-agent separate exploration with shared intrinsic motivations (2022)
28. Seurin, M., Strub, F., Preux, P., Pietquin, O.: Don’t do what doesn’t matter: intrinsic motivation with action usefulness. arXiv:2105.09992 (2021)
29. Zha, D., Ma, W., Yuan, L., Hu, X., Liu, J.: Rank the episodes: a simple approach for exploration in procedurally-generated environments. arXiv:2101.08152 (2021)
30. Espeholt, L., et al.: IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In: International Conference on Machine Learning, pp. 1407–1416 (2018)
31. Chevalier-Boisvert, M., Willems, L., Pal, S.: Minimalistic gridworld environment for OpenAI gym. http://github.com/maximecb/gym-minigrid (2018)
32. Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. arXiv:1506.02438 (2015)

Acknowledgments

A. Andres and J. Del Ser would like to thank the Basque Government for its funding support through the research group MATHMODE (T1294-19) and the BIKAINTEK PhD support program.

Author information

Authors and Affiliations

TECNALIA, Basque Research and Technology Alliance (BRTA), 48160, Derio, Spain
Alain Andres, Esther Villar-Rodriguez & Javier Del Ser
University of the Basque Country (UPV/EHU), 48013, Bilbao, Spain
Alain Andres & Javier Del Ser

Corresponding author

Correspondence to Alain Andres.

Editor information

Editors and Affiliations

University of Natural Resources and Life Sciences Vienna, Vienna, Austria
Andreas Holzinger
St. Pölten University of Applied Sciences, St. Pölten, Austria
Peter Kieseberg
TU Wien, Vienna, Austria
A Min Tjoa
SBA Research, Vienna, Austria
Edgar Weippl

Cite this paper

Andres, A., Villar-Rodriguez, E., Del Ser, J. (2022). An Evaluation Study of Intrinsic Motivation Techniques Applied to Reinforcement Learning over Hard Exploration Environments. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2022. Lecture Notes in Computer Science, vol 13480. Springer, Cham. https://doi.org/10.1007/978-3-031-14463-9_13

DOI: https://doi.org/10.1007/978-3-031-14463-9_13
Published: 11 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14462-2
Online ISBN: 978-3-031-14463-9
eBook Packages: Computer Science, Computer Science (R0)
